CI: run cuda.bindings examples on Linux and Windows #1517
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test
Analysis:
- examples were invoked via `python -m pytest` from within `cuda_bindings`, so the repo checkout was on sys.path and imports resolved to the source tree
- `setuptools_scm` generates `cuda/bindings/_version.py` only in the built wheel, so the source tree lacks this file and `from cuda.bindings._version import __version__` fails during example collection
- running `pytest` via the installed entrypoint avoids CWD precedence and keeps imports coming from the installed wheel, which includes the generated version file

Change:
- switch the Linux and Windows example steps to call the `pytest` entrypoint (see the sketch below)
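A minimal sketch of the import-resolution difference described above (illustration only, not the actual CI diff). It assumes it is run from inside the cuda_bindings checkout with the cuda-bindings wheel installed; the `probe` string exists only for this demonstration.

```python
# Illustration of why `python -m pytest` vs. the `pytest` entrypoint matters.
import subprocess
import sys

# Ask where `cuda.bindings` would be imported from, without importing it
# (importing the source tree would already fail on the missing _version.py).
probe = (
    "import importlib.util; "
    "print(importlib.util.find_spec('cuda.bindings').origin)"
)

# Like `python -m pytest`, a bare `python -c` puts the current directory at
# the front of sys.path, so cuda.bindings resolves to the source checkout,
# which has no setuptools_scm-generated cuda/bindings/_version.py:
subprocess.run([sys.executable, "-c", probe], check=True)

# The `pytest` console-script entrypoint does not prepend the CWD, so test
# collection imports the installed wheel (which does contain _version.py);
# that is why the CI steps now invoke `pytest` directly.
```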
/ok to test
mdboom left a comment
This only gets the example tests running in CI, not all the other ways that tests get run (calling pytest directly, `pixi run test`, etc.). It would be preferable to do this at a higher level -- maybe it's possible to make a symlink from cuda_bindings/tests/ to cuda_bindings/examples (a rough sketch follows below) -- so that they will be included even for local development.
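For illustration, here is one way the symlink suggestion could look; a rough sketch only, with the paths taken from this thread rather than from an actual change in the PR.

```python
# Rough sketch of the symlink idea (assumptions: run from the repo root,
# with the cuda_bindings/tests/ and cuda_bindings/examples/ layout this PR
# touches; on Windows, creating symlinks may need extra privileges).
from pathlib import Path

link = Path("cuda_bindings/tests/examples")
target = Path("../examples")  # relative, so the checkout stays relocatable

if not link.exists():
    link.symlink_to(target, target_is_directory=True)

# Any pytest run targeting cuda_bindings/tests/ (local `pytest`,
# `pixi run test`, or CI) would then also collect the examples.
```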
…irely under cuda_bindings/examples/
e34a864 to 7fa3f76
/ok to test
Let's follow what I set up in cuda-core. All cuda-core examples are run as part of the regular tests, both locally and in CI.
@rwgk for reproducing the bug, try replacing line 33 of cuda-python/ci/test-matrix.yml (as of 78ade5c) with 3.14t, so that we use free-threading Python.
CC @rparolin since this is about priorities: this is turning into a bigger project than expected. What should I do? I agree the cuda-core approach is much better than what we have now, but this PR provides immediate CI coverage without blocking that direction.
My preference: merge this PR and create a new issue so the follow-up work can be prioritized properly: Rework organization of cuda_bindings/examples
/ok to test
Done: commit dbfa3db
Here is our reproducer 🙂
Awesome! I'll work on taking care of that now, to get the nvbug out of limbo. |
Keep pointer arrays alive through launches to avoid free-threaded Python misaligned-address failures caused by temporary argument buffers.
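To illustrate the fix described in that commit message, here is a minimal, hypothetical sketch of the keep-alive pattern; `launch_kernel()` is a stand-in for the real cuda.bindings driver launch call, and only the lifetime handling is the point.

```python
# Sketch of keeping kernel-argument buffers alive through a launch. If the
# ctypes buffers holding the arguments are referenced only by a temporary,
# free-threaded Python may reclaim or reuse that memory while the launch is
# still reading it, which can surface as misaligned-address failures.
import ctypes

def pack_args(*scalars):
    """Pack scalar kernel arguments into a void**-style pointer array."""
    buffers = [ctypes.c_int(v) for v in scalars]          # per-argument storage
    pointers = (ctypes.c_void_p * len(buffers))(
        *(ctypes.addressof(b) for b in buffers)
    )
    # Return both: the caller must hold `buffers` (and `pointers`) until the
    # launch has consumed them, not just the raw addresses.
    return pointers, buffers

# Fragile: nothing keeps the per-argument buffers alive past this expression.
#   launch_kernel(kernel, pack_args(1, 2, 3)[0])
#
# Safe: bind the packed arguments to names that outlive the launch.
#   pointers, keep_alive = pack_args(1, 2, 3)
#   launch_kernel(kernel, pointers)
#   # `keep_alive` (and `pointers`) stay referenced until after the launch.
```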
/ok to test
Wow, so many unrelated flakes! Five; I looked at all of them. The one we're most interested in just says "This job failed" with a big X; it didn't even start up. Currently no job is running, only 34 are queued. I'll cancel and rerun. Nothing is lost, and hopefully we'll have better luck with the infrastructure.
Closes #697
This PR is motivated by nvbug 5808967 / #1525: currently there is no automatic testing for cuda_bindings/examples at all, so the QA team can be side-tracked unnecessarily by failures that the CI here could discover automatically.
This PR enables running the cuda_bindings/examples in wheel-based test environments, which is both a gain and a simplification (see the changes in cuda_bindings/examples/common/common.py).
Non-goal for this PR: structural changes to run the examples in various environments or in different ways.
Note that `scripts/run_tests.sh` runs the examples by default. This PR makes `scripts/run_tests.sh` succeed in local wheel-based environments (except for one unrelated failure in cuda_core; please ignore it for the purpose of this PR):
(no output)
For completeness: I was hoping that our CI would reproduce the Python 3.14t failure reported under nvbug 5808967, but it does not. At least we know that now, which helps narrow the search for the root cause.